Detailed Description Text - DETX (6):

The present invention identifies skin 
areas in video image sequences. Shape 
locator 50 initially locates one or more 
likely skin areas in a video frame 
based on the identification of edges of all 
objects in the video frame and a 
determination of whether any of such edges 
approximate the outline of a 
predetermined shape. The analysis of edges 
based on approximations to 
predetermined shapes is important because 
objects that are likely to contain 
skin areas have a high probability of being 
identified. For example, in some 
instances a person's face or head will 
nearly approximate the shape of an 
ellipse. Thus, the analysis of a video 
frame to identify ellipses provides a 
high probability location of some skin 
areas.



Detailed Description Text - DETX (32) : 

Typically, when a D×D block of 
pixels has an AC signal energy value 
that is less than 120,000, such a block of 
pixels is identified as a skin 
region. It is advantageous to utilize the 
signal energy components of the 
luminance parameter to determine skin 
areas, since non-skin areas tend to have 
much higher signal energy components than 
do skin areas. Identifying such 
non-skin areas and eliminating them from 
the color sampling process increases 
the probability that the color of a sampled 
pixel is actually a skin area 
pixel, and thus improves the accuracy of 
the range of tones to be sampled. 
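
As a rough illustration of this test, the sketch below computes the AC signal energy of a D×D luminance block as the total sum of squares minus the DC contribution. The function names and the normalization are assumptions made for illustration; only the 120,000 threshold comes from the text above, and the scaling used by the actual DCT stage may differ.

```python
# Hypothetical sketch: AC signal energy of a D x D luminance block.
# Assumption: with an orthonormal DCT, total energy equals the sum of
# squared luminance values and the DC energy equals (sum of values)^2 / N,
# so the AC energy is their difference. The source's scaling is not given.

AC_ENERGY_THRESHOLD = 120_000  # value quoted in the text


def ac_energy(block):
    """Return the AC signal energy of a square block of luminance values."""
    values = [y for row in block for y in row]
    total_energy = sum(y * y for y in values)
    dc_energy = sum(values) ** 2 / len(values)
    return total_energy - dc_energy


def is_skin_block(block, threshold=AC_ENERGY_THRESHOLD):
    """Classify a block as a likely skin region when its AC energy is low."""
    return ac_energy(block) < threshold


# Illustrative 4 x 4 blocks (made-up values, not from the source).
smooth = [[120, 121, 119, 120]] * 4                    # low-texture block
textured = [[0, 255, 0, 255], [255, 0, 255, 0]] * 2    # high-contrast block
```

Under these assumptions the smooth block falls far below the threshold and the high-contrast block falls above it, which matches the observation that non-skin areas tend to carry much higher signal energy.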
ABSTRACT 



An apparatus for detecting skin areas in video sequences is 
disclosed. The apparatus is configured to include a shape 
locator and a tone detector. The shape locator analyzes the 
input video sequences to identify the edges of all the objects 
in a video frame and determine whether such edges approximate 
the outline of a predetermined shape that is likely to 
contain a skin area. Once objects likely to contain skin areas 
are located by the shape locator, the tone detector examines 
the picture elements (pixels) of each located object to 
determine if such pixels have signal energies that are 
characteristic of skin areas. The tone detector then samples 
pixels that have signal energies which are characteristic of 
skin areas to determine a range of skin tones and compares 
the range of sampled skin tones with the tones in the entire 
frame to find all matching skin tones. An eyes-nose-mouth 
(ENM) region detector is optionally incorporated between 
the shape locator and the tone detector to identify the 
location of an ENM region on an object that is likely to be 
a face, so as to improve the accuracy of the range of skin 
tones that are sampled by the tone detector. 
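
The sampling and matching step performed by the tone detector can be sketched as follows. This is a minimal illustration, assuming RGB pixels, the color difference values Cr = R - Y and Cb = B - Y used in the detailed description, and a simple minimum/maximum interval as the "range of skin tones". All function names are hypothetical.

```python
# Hypothetical sketch of tone sampling and matching.
# Assumptions: pixels are (R, G, B) tuples; luminance uses the common
# weights 0.299 / 0.587 / 0.114; the tone range is a min/max box in
# (Cr, Cb) space. None of these names come from the source.


def chroma(rgb):
    """Return the color difference values (Cr, Cb) of one RGB pixel."""
    r, g, b = rgb
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return (r - y, b - y)


def sample_tone_range(skin_pixels):
    """Determine the (Cr, Cb) range covered by pixels sampled as skin."""
    crs, cbs = zip(*(chroma(p) for p in skin_pixels))
    return (min(crs), max(crs)), (min(cbs), max(cbs))


def matches_tone(rgb, tone_range):
    """Check whether a pixel's chrominance lies inside the sampled range."""
    (cr_lo, cr_hi), (cb_lo, cb_hi) = tone_range
    cr, cb = chroma(rgb)
    return cr_lo <= cr <= cr_hi and cb_lo <= cb <= cb_hi


def find_skin_pixels(frame, tone_range):
    """Return (x, y) positions in the frame whose tones match the range."""
    return [(x, y)
            for y, row in enumerate(frame)
            for x, rgb in enumerate(row)
            if matches_tone(rgb, tone_range)]


# Made-up sample values for illustration only.
samples = [(200, 150, 130), (220, 150, 120), (180, 140, 130)]
frame = [[(205, 155, 135), (30, 90, 200)]]
tone_range = sample_tone_range(samples)
skin_positions = find_skin_pixels(frame, tone_range)
```

Because the range is derived from pixels sampled in the frame itself rather than from a fixed table of tones, the comparison adapts to the lighting and skin tone actually present.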

43 Claims, 3 Drawing Sheets 
SKIN AREA DETECTION FOR VIDEO 
IMAGE SYSTEMS 

FIELD OF THE INVENTION 

The present invention relates to a low bit-rate communication 
system for multimedia applications, such as a video 
teleconferencing system, and more particularly, to a method of, 
and system for, identifying skin areas in video images. 

DESCRIPTION OF THE RELATED ART 

The storage and transmission of full-color, full-motion 
images is increasingly in demand. These images are used, 
not only for entertainment, as in motion picture or television 
productions, but also for analytical and diagnostic tasks such 
as engineering analysis and medical imaging. 

There are several advantages to providing these images in 
digital form. For example, digital images are more suscep- 
tible to enhancement and manipulation. Also, digital video 
images can be regenerated accurately over several genera- 
tions with only minimal signal degradation. 

On the other hand, digital video requires significant 
memory capacity for storage and equivalently, it requires a 
high-bandwidth channel for transmission. For example, a 
single 512 by 512 pixel gray-scale image with 256 gray 
levels requires more than 256,000 bytes of storage. A full 
color image requires nearly 800,000 bytes. Natural-looking 
motion requires that images be updated at least 30 times per 
second. A transmission channel for natural-looking full color 
moving images must therefore accommodate approximately 
190 million bits per second. However, modern digital com- 
munication applications, including videophones, set-top- 
boxes for video-on-demand, and video teleconferencing 
systems have transmission channels with bandwidth 
limitations, so that the number of bits available for trans- 
mitting video image information is less than 190 million bits 
per second. 
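
The storage and bandwidth figures quoted above follow from simple arithmetic, reproduced here as a short check. The sketch assumes one byte per gray-scale pixel (256 levels) and three 8-bit components per full-color pixel; the constant names are illustrative.

```python
# Arithmetic check of the storage and channel figures quoted in the text.
# Assumption: a full-color pixel uses three 8-bit components.

WIDTH, HEIGHT = 512, 512
BYTES_PER_GRAY_PIXEL = 1      # 256 gray levels fit in 8 bits
COLOR_COMPONENTS = 3          # assumed: three 8-bit color components
FRAMES_PER_SECOND = 30

gray_bytes = WIDTH * HEIGHT * BYTES_PER_GRAY_PIXEL       # 262,144 bytes
color_bytes = gray_bytes * COLOR_COMPONENTS              # 786,432 bytes
bits_per_second = color_bytes * 8 * FRAMES_PER_SECOND    # 188,743,680 bits/s

print(gray_bytes, color_bytes, bits_per_second)
```

The results, 262,144 bytes, 786,432 bytes and roughly 189 million bits per second, agree with the "more than 256,000", "nearly 800,000" and "approximately 190 million" figures in the text.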

As a result, a number of image compression techniques 
such as, for example, discrete cosine transformation (DCT) 
have been used to reduce the information capacity required 
for the storage and transmission of digital video signals. 
These techniques generally take advantage of the consider- 
able redundancy in any natural image, so as to reduce the 
amount of data used to transmit, record, and reproduce the 
digital video images. For example, if the video image to be 
transmitted is an image of the sky on a clear day, the discrete 
cosine transform (DCT) image data information has many 
zero data components since there is little or no variation in 
the objects depicted for such an image. Thus, the image 
information of the sky on a clear day is compressed by 
transmitting only the small number of non-zero data com- 
ponents. 
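
The clear-sky example can be illustrated with a small two-dimensional DCT. The sketch below uses an orthonormal DCT-II written out directly, with a 4x4 block size and pixel values chosen only for illustration; it is not the transform configuration of any particular codec.

```python
import math

# Illustrative orthonormal 2-D DCT-II on a small square block.
# Assumptions: 4 x 4 block size and made-up pixel values.


def dct_2d(block):
    """Return the orthonormal 2-D DCT-II coefficients of a square block."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for i in range(n):
                for j in range(n):
                    total += (block[i][j]
                              * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * j + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


def count_nonzero(coeffs, eps=1e-9):
    """Count coefficients whose magnitude exceeds a small tolerance."""
    return sum(1 for row in coeffs for c in row if abs(c) > eps)


sky = [[200] * 4 for _ in range(4)]                 # uniform block, no variation
edge = [[0, 0, 255, 255] for _ in range(4)]         # block with a sharp transition

sky_nonzero = count_nonzero(dct_2d(sky))    # only the DC term survives
edge_nonzero = count_nonzero(dct_2d(edge))  # several AC terms are non-zero
```

A uniform block leaves only the DC coefficient non-zero, so very little data needs to be transmitted, while a block containing a transition spreads energy into the AC coefficients.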

One problem associated with image compression 
techniques, such as discrete cosine transformation (DCT) is 
that they produce lossy images, since only partial image 
information is transmitted in order to reduce the bit rate. A 
lossy image is a video image which contains distortions in 
the objects depicted, when the decoded image content is 
compared with the original image content. Since most video 
teleconferencing or telephony applications are focused 
toward images containing persons rather than scenery, the 
ability to transmit video images without distortions is impor- 
tant. This is because a viewer will tend to focus his or her 
attention toward specific features (objects) contained in the 
video sequences such as the faces, hands or other skin areas 
of the persons in the scene, instead of toward items, such as, 
for example, clothing and background scenery. 

In some situations, a very good rendition of facial features 
contained in a video sequence is paramount to intelligibility, 
such as in the case of hearing-impaired viewers who may 
rely on lip reading. For such an application, decoded video 
image sequences which contain distorted facial regions can 
be annoying to a viewer, since such image sequences are 
often depicted with overly smoothed-out facial features, 
giving the faces an artificial quality. For example, fine facial 
features such as wrinkles that are present on faces found in 
an original video image tend to be erased in a decoded 
version of a compressed and transmitted video image, thus 
hampering the viewing of the video image. 

Several techniques for reducing distortions in skin areas 
of images that are transmitted have focused on extracting 
qualitative information about the content of the video 
images including faces, hands and the other skin areas of the 
persons in the scene, in order to code such identified areas 
using fewer data compression components. Thus, these 
identified areas are coded and transmitted using a larger 
number of bits per second, so that such areas contain fewer 
distorted features when the video images are decoded. 

In one technique, a sequence of video images is searched 
for symmetric shapes. A symmetric shape is defined as a 
shape which is divisible into identical halves about an axis 
of symmetry. An axis of symmetry is a line segment which 
divides an object into equal parts. Examples of symmetrical 
shapes include squares, circles and ellipses. If the objects in 
a video image are searched for symmetrical shapes, some of 
the faces and heads shown in the video image are 
identifiable. Faces and heads that are depicted symmetrically 
typically approximate the shape of an ellipse and have an 
axis of symmetry vertically positioned between the eyes, 
through the center of the nose and halfway across the mouth. 
Each half-ellipse is symmetric because each contains one 
eye, half of the nose and half of the mouth. However, only 
those faces and heads that are symmetrically depicted in the 
video image are recognizable, precluding the identification 
of heads and faces when viewed in profile (turned to the left 
or turned to the right), since a face or head viewed in profile 
does not contain an axis of symmetry. Hands and other skin 
areas of the persons in the scene are similarly not symmetric 
objects and are also not recognizable using a symmetry-based 
technique. 

Another technique searches the video images for specific 
geometric shapes such as, for example, ellipses, rectangles 
or triangles. Searching the video images for specific 
geometric shapes can often locate heads and faces, but still 
cannot identify hands and other skin areas of persons in the 
scene, since such areas are typically not represented by a 
specified geometric shape. Additionally, partially obstructed 
faces and heads which do not approximate a specified 
geometric shape are similarly not recognizable. 

In yet another technique, a sequence of video images is 
searched using color (hue) to identify skin areas including 
heads, faces and hands. Color (hue) based identification is 
dependent upon using a set of specified skin tones to search 
the video sequences for objects which have matching skin 
colors. While the color (hue) based techniques are useful to 
identify some hands, faces or other skin areas of a scene, 
many other such areas cannot be identified since not all 
persons have the same skin tone. In addition, color 
variations in many skin areas of the video sequences will also not 
be detectable. This is because the use of a set of specified 
skin tones to search for matching skin areas precludes color 
based techniques from compensating for unpredictable 
changes to the color of an object, such as variations 
attributable to background lighting and/or shading. 

Accordingly, skin identification techniques that identify 
hands, faces and other skin areas of persons in a scene 
continue to be sought. 

SUMMARY OF THE INVENTION 

The present invention is directed to a skin area detector 
for identifying skin areas in video images and, in an 
illustrative application, is used in conjunction with the video 
coder of video encoding/decoding (Codec) equipment. The 
skin area detector identifies skin areas in video frames by 
initially analyzing the shape of all the objects in a video 
sequence to locate one or more objects that are likely to 
contain skin areas. Objects that are likely to contain skin 
areas are further analyzed to determine if the picture 
elements (pixels) of any such object or objects have signal 
energies characteristic of skin regions. The term signal 
energy as used herein refers to the sum of the squares of the 
luminance (brightness) parameter for a specified group of 
pixels in the video signal. The signal energy includes two 
components: a direct current (DC) signal energy and an 
alternating current (AC) signal energy. The color parameters 
of objects with picture elements (pixels) that have signal 
energies characteristic of skin regions are then sampled to 
determine a range of skin tone values for the object. This 
range of sampled skin tone values for the analyzed object is 
then compared with all the tones contained in the video 
image, so as to identify other areas in the video sequence 
having the same skin tone values. The identification of likely 
skin regions in objects based on shape analysis and a 
determination of the signal energies characteristic of skin 
regions is advantageous. This is because the subsequent 
color sampling of such identified objects to determine a 
range of skin tone values, automatically compensates for 
color variations in the object and thus skin detection is made 
dynamic with respect to the content of a video sequence. 

In the present illustrative example, the skin area detector 
is integrated with but functions independently of the other 
component parts of the video encoding/decoding (Codec) 
equipment which includes an encoder, a decoder and a 
coding controller. In one embodiment, the skin area detector 
is inserted between the input video signal and the coding 
controller, to provide input related to the location of skin 
areas in video sequences, prior to the encoding of the video 
images. 

In one example of the present invention, the skin area 
detector includes a shape locator and a tone detector. The 
shape locator analyzes input video sequences to identify the 
edges of all the objects in a video frame and determine 
whether such edges approximate the outline of a shape that 
is likely to contain a skin area. The shape locator is 
advantageously programmed to identify certain shapes that are 
likely to contain skin areas. For example, since human faces 
have a shape that is approximately elliptical, the shape 
locator is programmed to search for elliptically shaped 
objects in the video signal. 

Since an entire video frame is too large to analyze 
globally, it is advantageous if the video frame of an input 
video sequence is first partitioned into image areas. For each 
image area, the edges of objects are then determined based 
on changes in the magnitude of the pixel (picture element) 
intensities for adjacent pixels. If the changes in the 
magnitude of the pixel intensities for adjacent pixels in each image 
area are larger than a specified magnitude, the location of 
such an image area is identified as containing an edge or a 
portion of the edge of an object. 

Thereafter, identified edges or a portion of identified 
edges are further analyzed to determine if such edges, which 
represent the outline of an object, approximate a shape that 
is likely to contain a skin area. Since skin areas are usually 
defined by the softer curves of human shapes (e.g., the nape 
of the neck, and the curve of chin), rigid angular borders 
are not typically indicative of skin areas. Thus, 
configurations that are associated with softer human shapes are 
selected as likely to contain skin areas. For example, 
since an ellipse approximates the shape of a person's face or 
head, the analysis of a video sequence to identify those 
outlines of objects which approximate ellipses, 
advantageously determines some locations in the video sequence 
that are likely to contain skin areas. Also, in the context of 
video conferencing, at least one person is typically facing 
the camera, so if one or more persons are in the room, then 
it is likely that an elliptical shape will be identified. 

Once objects likely to contain skin areas are located by the 
shape locator, the tone detector examines the picture 
elements (pixels) of each located object to determine if such 
pixels have signal energies that are characteristic of skin 
areas. The tone detector then samples the range of skin tones 
for such identified objects and compares the range of sampled 
skin tones with the tones in the entire frame to determine all 
matching skin tones. In the present embodiment, the signal 
energy components (DC and AC energy components) of the 
luminance parameter are advantageously determined using the 
discrete cosine transformation (DCT) technique. 

In the technique of the present invention, the discrete 
cosine transform (DCT) of the signal energy for a specified 
group of pixels in an object identified as likely to contain a 
skin area is calculated. Thereafter, the AC energy component 
of each pixel is determined by subtracting the DC energy 
component for each pixel from the discrete cosine transform 
(DCT). Based on the value of the AC energy component for 
each pixel, a determination is made as to whether the pixels 
have an AC signal energy characteristic of a skin area. If the 
AC energy for an examined pixel is less than a 
specified value, typically such pixels are identified as skin 
pixels. Thereafter, the tone detector samples the color 
parameters of such identified pixels and determines a range 
of color parameters indicative of skin tone that are contained 
in the region of the object. 

The color parameters sampled by the tone detector are 
advantageously chrominance parameters, Cr and Cb. The 
term chrominance parameters as used herein refers to the 
color difference values of the video signal, wherein Cr is 
defined as the difference between the red color component 
and the luminance parameter (Y) of the video signal and Cb 
is defined as the difference between the blue color 
component and the luminance (Y) parameter of the video signal. 
The tone detector subsequently compares the range of 
identified skin tone values from the sampled object with the 
color parameters of the rest of the video frame to identify 
other skin areas. 

The skin area detector of the present invention thereafter 
analyzes the next frame of the video sequence to determine 
the range of skin tone values and identify skin areas in the 
next video frame. The skin area detector optionally uses the 
range of skin tone values identified in one frame of a video 
sequence to identify skin areas in subsequent frames of the 
video sequence. 

The skin area detector optionally includes an eyes-nose- 
mouth (ENM) region detector for analyzing some objects 
which approximate the shape of a person's face or head, to 
determine the location of an eyes-nose-mouth (ENM) 
region. In one embodiment, the ENM region detector is 
inserted between the shape locator and the tone detector to 
identify the location of an ENM region and use such a region 
as a basis for analysis by the tone detector. The eyes-nose- 
mouth (ENM) region detector utilizes symmetry based 
methods to identify an ENM region located within an object 
which approximates the shape of a person's face or head. It 
is advantageous for the eyes-nose-mouth (ENM) region to 
be identified since such a region of the face contains skin 
color parameters as well as color parameters other than skin 
tone parameters, including for example, eye color 
parameters, eyebrow color parameters, lip color parameters 
and hair color parameters. Also, the identification of the 
eyes-nose-mouth (ENM) region reduces computational 
complexity, since skin tone parameters are sampled from a 
small region of the identified object. 

Other objects and features of the present invention will 
become apparent from the following detailed description 
considered in conjunction with the accompanying drawings. 
It is to be understood, however, that the drawings are 
designed solely for purposes of illustration and not as a 
definition of the limits of the invention, for which reference 
should be made to the appended claims. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram of a video coder/decoder 
(Codec) embodying an illustrative application of the 
principles of the present invention; 

FIG. 2 is a block diagram of the skin area detector of the 
present invention; 

FIG. 3 shows a block diagram of the shape locator of FIG. 
2; 

FIG. 4 is a block diagram of the preprocessor circuit of the 
shape locator of FIG. 3; 

FIG. 5 shows a block diagram of the tone detector of FIG. 
2; 

FIG. 6 illustrates a 4x4 block of pixels; 

FIG. 7 shows a block diagram of the skin area detector 
including an eyes-nose-mouth (ENM) region detector; and 

FIG. 8 illustrates a rectangular window located within an 
ellipse. 

DETAILED DESCRIPTION 

FIG. 1 shows an illustrative application of the present 
invention wherein a skin area detector 12 is used in 
conjunction with a video coding/decoding system such as, for 
example, video codec 10 (coder/decoder). Video coding/ 
decoding systems such as video codec 10 are utilized 
primarily in the teleconferencing industry for the coding and 
decoding of video image sequences based on image 
compression techniques. An example of an image compression 
technique useful for the coding and decoding of video image 
sequences includes the Discrete Cosine Transform (DCT) 
method, described in ITU-T Recommendation H.263 
("Video coding for narrow communication channels"). It 
should be understood, of course, that the present invention 
is useful with video systems other than a video coder/ 
decoder (codec), such as, for example motion picture editing 
equipment. Indeed, the present invention is applicable for 
use with any equipment to which a digital color video signal 
is input. 

One embodiment of the present invention is illustrated in 
FIG. 1, which shows skin area detector 12 (enclosed with 
dashed lines), located within video codec 10. Skin area 
detector 12 is integrated with, but functions independently 
from, the other component parts of video codec 10. For 
example, video codec 10 includes additional component 
parts such as, video coder 22, video decoder 24 and coding 
controller 16. Such component parts will be discussed in 
conjunction with the following explanation of the operation 
of video codec 10. 

Skin area detector 12, shown in greater detail in the block 
diagram of FIG. 2, includes a shape locator 50 and a tone 
detector 56. The functions represented by the shape locator 50 
and tone detector 56 are optionally provided through the use of 
either shared or dedicated hardware, including hardware 
capable of executing software. For example, the functions of 
shape locator 50 and tone detector 56 are optionally 
provided by a single shared processor or by a plurality of 
individual processors. 

The use of the individual functional blocks 
representing shape locator 50 and tone detector 56 is not to be 
construed to refer exclusively to hardware capable of 
executing software. Examples of additional illustrative 
embodiments for the functional blocks described above, 
include digital signal processor (DSP) hardware, such as the 
AT&T DSP16 or DSP32C, read-only memory (ROM) for 
storing software performing the operations discussed below, 
and random access memory (RAM) for storing digital signal 
processor (DSP) results. Very large scale integration (VLSI) 
hardware embodiments, as well as custom VLSI circuitry in 
combination with a general purpose digital signal processor 
(DSP) circuit, are also optionally contemplated. Any and/or 
all such embodiments are deemed to fall within the meaning 
of the functional blocks labeled shape locator 50 and tone 
detector 56. 

The present invention identifies skin areas in video image 
sequences. Shape locator 50 initially locates one or more 
likely skin areas in a video frame based on the identification 
of edges of all objects in the video frame and a determination 
of whether any of such edges approximate the outline of a 
predetermined shape. The analysis of edges based on 
approximations to predetermined shapes is important 
because objects that are likely to contain skin areas have a 
high probability of being identified. For example, in some 
instances a person's face or head will nearly approximate the 
shape of an ellipse. Thus, the analysis of a video frame to 
identify ellipses provides a high probability location of some 
skin areas. 

Objects identified as likely skin areas are thereafter 
analyzed by tone detector 56 to determine whether the picture 
elements (pixels) of any such object or objects have signal 
energies characteristic of skin regions. The term signal 
energy as used in this disclosure refers to the sum of the 
squares of the luminance (brightness) parameter for a 
specified group of pixels in the video signal and includes two 
energy components: a direct current (DC) signal energy and 
an alternating current (AC) signal energy. The color 
parameters of objects with picture elements (pixels) that have 
signal energies characteristic of skin regions are then 
sampled to determine a range of skin tone (color) values for 
the object. The range of skin tone values for the object are 
then compared with all the tones contained in the video 
image, so as to identify other areas in the video sequence 
having the same skin tone values. When skin areas are 
identified based on an analysis of signal energies, followed 
by the sampling of skin tone values, skin detection is made 
dynamic with respect to the content of the video sequence, 
because the skin tone sampling of identified objects 
automatically compensates for unpredictable changes to the 
color tones of an object, such as variations attributable to 
background lighting and/or shading. 

The component parts of both shape locator 50 and tone 
detector 56 are described below, with reference to FIG. 2, as 
part of an explanation of the operation of skin area detector 
12. An input video signal 26 representing a sequence of 
frames corresponding to an image of an object as a function 
of time, is provided to shape locator 50 from a conventional 
video camera (not shown) such as, for example, the View 
Cam, manufactured by Sharp Corporation. Shape locator 50 
analyzes at least one of the frames of the input video signal 
26 to identify the edges of all the objects in the frame and 
determine if an edge or a portion of an edge approximates a 
shape that is likely to include a skin area. Examples of 
shapes that are likely to include skin areas include ellipses, 
arcs and curves. The term curve as used in this disclosure 
refers to a shape having at least a portion of an edge that is 
not a straight line. 

The component parts of shape locator 50 are illustrated in 
FIG. 3 and include a shape location preprocessor 94 as well 
as a coarse scanner 100, a fine scanner 102 and a shape fitter 
104. The shape fitter 104 generates a shape locator signal 
106, which is provided to the tone detector 56. 

The shape location preprocessor 94 functions to analyze 
the regions of the video image to identify the edges of 
objects contained in the video frame. The shape location 
preprocessor 94 incorporates a preprocessing circuit, as 
illustrated in FIG. 4, including a downsampler 118, a filter 
120, a decimator 122, an edge detector 124 and a 
thresholding circuit 126. 

Temporal downsampler 118 functions to limit the number 
of frames of the video signal that are available for shape 
identification by selecting, for analysis, only a small number 
of frames from the total number of frames available in the 
input video signal 26. As an illustrative example, a typical 
frame rate for a video signal such as input video signal 26 
approximates 30 frames per second (fps), with each succes- 
sive frame containing information essentially identical to 
that of the previous frame. Since successive frames contain 
essentially identical information, it is advantageous to 
reduce computational complexity by selecting only a small 
number of frames from the video signal for shape analysis. 
Thus, regarding the present example, assume that the 
downsampler, in order to reduce computational complexity, 
selects only every fourth frame of the input video signal for 
shape analysis. As a result, the downsampler reduces the 
frame rate input into shape locator 50, from a rate of about 
30 frames per second (fps) to a rate of about 7.5 frames per 
second (fps). 
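
The relationship between the selection interval and the resulting frame rate can be checked with a short sketch. The function names are illustrative, and frames are represented simply by their indices.

```python
# Illustrative temporal downsampling: keep every `step`-th frame.
# Frames are represented by their indices; names are hypothetical.


def temporal_downsample(frames, step):
    """Keep every `step`-th frame, starting with the first."""
    return frames[::step]


def downsampled_rate(fps, step):
    """Frame rate that results from keeping one frame out of every `step`."""
    return fps / step


one_second = list(range(30))                 # 30 frames captured in one second
kept = temporal_downsample(one_second, 4)    # frames 0, 4, 8, ..., 28
```

Keeping every fourth frame of a 30 fps signal yields 7.5 fps, while keeping every second frame yields 15 fps.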

Filter 120 is typically a separable filter for performing 
spatial filtering of a downsampled video frame, having a size 
360x240 pixels and with a cut-off frequency of π/c, where 
c is advantageously equivalent to the decimation (division) 
factor, discussed below. Typically, a filter such as filter 120 
defines a range of frequencies. When a signal such as 
downsampled input video signal 26 is provided to filter 120, 
only those frequencies contained in the video signal that are 
within the range of defined frequencies for the filter are 
output. The frequencies contained in the video signal that are 
outside the range of defined frequencies for the filter are 
suppressed. Examples of filter 120 include finite impulse 
response (FIR) filters and infinite impulse response (IIR) 
filters. The filtered video signal is input to decimator 122 
where both the horizontal and vertical dimensions of the 
video image frame are partitioned into image areas having 
smaller predetermined sizes, for edge analysis. As an 
illustrative example, if a decimator such as decimator 122 has a 
decimation factor of c=8, and the video image frame has 
dimensions of 360x240 pixels, then the video image frame 
is partitioned into image areas with dimensions of 45x30 
pixels. 
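
The c=8 example can be sketched as a filter-and-decimate step. This is a minimal illustration that reads decimation in the usual signal-processing sense and uses a plain c x c box average as a crude stand-in for the separable low-pass filter; the actual filter design is not specified in the text, and the function name is hypothetical.

```python
# Illustrative low-pass filtering and decimation by a factor c.
# Assumption: a c x c box average stands in for the separable filter;
# the real filter 120 would be an FIR or IIR design.


def box_filter_decimate(frame, c):
    """Average each c x c tile, reducing both dimensions by a factor of c."""
    height, width = len(frame), len(frame[0])
    out = []
    for top in range(0, height - height % c, c):
        row = []
        for left in range(0, width - width % c, c):
            tile = [frame[y][x]
                    for y in range(top, top + c)
                    for x in range(left, left + c)]
            row.append(sum(tile) / len(tile))
        out.append(row)
    return out


# Synthetic 360 x 240 luminance frame (240 rows of 360 samples).
frame = [[(x + y) % 256 for x in range(360)] for y in range(240)]
small = box_filter_decimate(frame, 8)
```

With c=8 the 360x240 frame reduces to 45 samples per row and 30 rows, matching the 45x30 dimensions given in the text.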

Edge detector 124 performs edge detection on each of the 
partitioned image areas of the video image frame, searching 
for the edges of objects. The edge of an object in any video 
image frame is typically characterized by changes in the 
magnitude of the pixel intensities for adjacent pixels. For 
example, if an image area of size 3x3 pixels does not contain 
an edge of an object, the magnitude of the pixel intensities 
for adjacent pixels, representative of such an image area, are 
nearly equivalent, as shown in matrix A, 



11 10 10 
10 10 10 
10 10 11 



In contrast, if a similar image area of size 3x3 pixels, 
contains the edge of an object, the magnitudes of the pixel 
intensities for adjacent pixels, representative of such an 
image area, contain sharp transitions, as shown in matrix B, 



10 50 90 
50 50 90 
90 90 90 
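
The contrast between matrices A and B can be checked numerically with a gradient operator such as the Sobel operator. The sketch below assumes the conventional 3x3 Sobel coefficients, sums the squares of the two responses as an edge strength, and compares the result with a threshold. The threshold value and all names are hypothetical.

```python
# Illustrative Sobel response for a single 3 x 3 image area.
# Assumptions: conventional Sobel coefficients; the threshold is made up.

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[1, 2, 1],
           [0, 0, 0],
           [-1, -2, -1]]


def apply_kernel(kernel, area):
    """Sum of element-wise products of a 3 x 3 kernel and a 3 x 3 area."""
    return sum(kernel[i][j] * area[i][j] for i in range(3) for j in range(3))


def edge_strength(area):
    """Sum of the squares of the horizontal and vertical responses."""
    gx = apply_kernel(SOBEL_X, area)
    gy = apply_kernel(SOBEL_Y, area)
    return gx * gx + gy * gy


def is_edge(area, threshold):
    """Mark the area as an edge when its strength exceeds the threshold."""
    return edge_strength(area) > threshold


A = [[11, 10, 10], [10, 10, 10], [10, 10, 11]]   # no edge
B = [[10, 50, 90], [50, 50, 90], [90, 90, 90]]   # contains an edge

EXAMPLE_THRESHOLD = 1000  # hypothetical value for illustration
```

For matrix A both responses are 0, while for matrix B they are 160 and -160, so only B exceeds the example threshold.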



Edge detectors such as edge detector 124 utilize techniques 
including Sobel operator techniques to identify the edges of 
objects by summing the squares of the convolution of 
two-dimensional Sobel operators such as δx and δy with the 
magnitudes of the pixel intensities for adjacent pixels, 
shown for example, in either matrix A or B, for a partitioned 
image area. As an illustrative example, using the Sobel 
operator technique, if the Sobel operators, represented in 
two dimensional form by the horizontal δx and vertical δy 
operators, described below, 

δx = 
-1  0  1 
-2  0  2 
-1  0  1 

δy = 
 1  2  1 
 0  0  0 
-1 -2 -1 

are convolved with the magnitudes of the pixel intensities 
for adjacent pixels in an image area that does not contain the 
edge of an object such as, for example, the pixel intensities 
for adjacent pixels of matrix A, 
the resulting convolution produces, in part, the result shown 
below, 

δx = (-1×11)+(0×10)+(1×10)+(-2×10)+(0×10)+(2×10)+(-1×10)+(0×10)+(1×11) ≈ 0 

δy = (1×11)+(2×10)+(1×10)+(0×10)+(0×10)+(0×10)+(-1×10)+(-2×10)+(-1×11) ≈ 0 

whose magnitudes approximate zero in two dimensions. In 
contrast, if the Sobel operators are convolved with the 
magnitudes of pixel intensities for adjacent pixels in an 
image area that contains the edge of an object such as, for 
example, the magnitudes of the pixel intensities for adjacent 
pixels, shown in matrix B, the resulting convolution 
produces, in part, the result shown below, 

δx = (-1×10)+(0×50)+(1×90)+(-2×50)+(0×50)+(2×90)+(-1×90)+(0×90)+(1×90) ≈ 160 

δy = (1×10)+(2×50)+(1×90)+(0×50)+(0×50)+(0×90)+(-1×90)+(-2×90)+(-1×90) ≈ -160 

whose magnitudes do not approximate zero. Edge detection 
techniques utilizing, for example, the above described Sobel 
operator techniques, are performed for each of the 
partitioned 45x30 pixel areas of the video frame. 

Thresholding circuit 126 then identifies those pixels in 
each 45x30 partitioned area, whose magnitude of 
convolved, squared and summed pixel intensities for 
adjacent pixels are larger than a specified value, assigning such 
identified pixels a non-zero numerical value. Pixels having 
a magnitude of convolved, squared and summed pixel 
intensities for adjacent pixels less than the specified value of 
the thresholding circuit 126, are assigned a zero numerical 
value. Edge data signals 128 corresponding to the non-zero 
pixel values are subsequently generated by the thresholding 
circuit 126. The incorporation of a thresholding circuit, such 
as, thresholding circuit 126, advantageously prevents 
contoured skin areas that are not edges from being misidentified 
as edges. This is because small variations in the magnitudes 

energies that are characteristic of skin regions the tone 
detector 56 samples the color parameters of the object, in 
order to identify a range of skin tone values. The tone 
detector 56 then compares the identified range of skin tone 
values to the color parameters of the rest of the video frame 
to identify other areas containing the same skin tone values. 

Color digital video signals contain red (R), green (G) and 
blue (B) color components and are typically available in a 
standard YUV color video format, where Y represents the 
luminance parameter and both U and V represent the 
chrominance parameters. The luminance (Y) parameter 
characterizes the brightness of the video image, while the 
chrominance (U,V) parameters define two color difference 
values, Cr and Cb. The relationships between the luminance 
(Y) parameter, the color difference values, Cr and Cb, and 
the three color components R, G and B are typically 
expressed as: 

Y = 0.299R + 0.587G + 0.114B 

Cr = R - Y 

Cb = B - Y 

In one embodiment of the present invention, tone detector 
56, as shown in FIG. 5, includes a skin region detector 200, 
of the pixel intensities for adjacent pixels typically produces a C r histogram generator 201, a C b histogram generator 203, 
convolved, squared and summed magnitudes that are less a C r range detector 205, a C b range detector 207 and a tone 
than the specified value of the thresholding circuit 126. comparator 209. 

Referring again to FIG. 3, the edge data signals 128 Skin region detector 200 correlates the input video signal 

generated by the shape location preprocessor 94 are input to 30 26 with the shape location signal 106, so that the objects 

the coarse scanner 100 of shape locator 50. The coarse identified in the video frame, by the shape locator 50 are 

scanner 100 segments the edge data signals 128 provided by segmented into blocks of DxD pixels. Skin region detector 

the shape location preprocessor 94, into blocks of size BxB 200 advantageously segments the identified shape into 

pixels; for example, of size 5x5 pixels. Each block is then blocks of 2x2 pixels, where D=2, in order to obtain one 

marked by the coarse scanner 100, if at least one of the 35 luminance parameter for each pixel as well as one C r value 

pixels in the block has a non-zero value, as discussed above. and one C b value for every block of 2x2 pixels. As an 

The array of segmented BxB blocks is then scanned in for illustrative example, FIG. 6 shows a 4x4 block of pixels 300. 

example, a left-to-right, top-to-bottom fashion, searching for A luminance parameter (Y) 301 is present for each pixel 

contiguous runs of marked blocks. For each such run of 300. In contrast, each block of 2x2 pixels 300 has one C r 

marked blocks, fine scanning and shape fitting are per- 40 value 302 and one C b value 303, which is present at the Vi 

formed. The inclusion of coarse scanner 100 as a component dimension in both the horizontal and vertical directions, 

part of shape locator 50 is optional depending on the Thus, each block of 2x2 pixels includes four luminance (Y) 

computational complexity of the system utilized. The fine parameters 301, as well as one C r value 302 and one C b 

scanner 102 scans the pixels in each contiguous run of value 303. Such segmentation, to include only one C r value 

segmented and marked BxB blocks, for example, in a 45 and only one C b value is important when skin tone sampling 

left-to-right, top-to-bottom fashion, to detect the first pixel in is performed for an identified object, as discussed below, 

each line of pixels that has a non-zero value and the last pixel Skin region detector 200 functions to analyze which of the 

in each line of pixels that has a non-zero value. The first and blocks of DxD pixels lying within the perimeter of an 

last non-zero detected pixels of each line are labeled with identified object represents skin areas by determining 

coordinates (x JMm y) and (x end , y), respectively. 50 whether each DxD block of pixels have signal energies 

The shape fitter 104 scans the coordinates labeled (x^^ characteristic of a skin region. The luminance (Y) parameter 

y) and (x^,, y) on each line of pixels. Geometric shapes of of the color video signal has two signal energy components: 

various sizes and aspect ratios stored in the memory of the an alternating current (AC) energy component and a direct 

shape fitter 104 that are likely to contain skin areas are then current (DC) energy component. Skin area pixels typically 

compared to the labeled coordinate areas, in order to deter- 55 have AC energy components with values less than a speci- 

mine approximate shape matches. Having determined a fied threshold energy, T CT . 

shape outline from a well fitting match of a predetermined In an embodiment of the present invention, skin areas are 

shape that is likely to contain a skin area such as, for detected based on the calculation of the AC energy compo- 

example, an ellipse, the shape locator 50 generates a shape nents for the luminance (Y) parameter of the color video 

location signal 106 based on the coordinates of the well- 60 signal. Methods including the discrete cosine transformation 

fitted shape, and provides such a shape location signal 106 (DCT) technique, as described in ITU-T Recommendation 

to the tone detector 56. H.263 ("Video coding for narrow communication 

Once shape locator 50 has identified the location of an channels") are useful for calculating the signal energies of 

object with a border that indicates the object is likely to the luminance (Y) parameter. As an illustrative example, the 

contain a skin area, tone detector 56 functions to analyze 65 AC energy components and the DC energy components of 

whether such an object contains signal energies that are the luminance parameters for each block of DxD pixels, is 

characteristic of skin regions. If the object contains signal determined by first calculating the discrete cosine transfer- 
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mation (DCT) function, F(u, v) for each pixel as shown 
below, from equation (1) 

where F(u, v) represents the discrete cosine transformation 
(DCT) function and C(u) and C(v) are defined as 

$$C(\omega)=\frac{1}{\sqrt{2}}\ \text{for}\ \omega=0$$

$$C(\omega)=1\ \text{for}\ \omega=1,2,3,\ldots$$

which are summed for each pixel location F(u,v) of the
block of DxD pixels. The AC signal energy, E(m,l), is then
determined by subtracting the square of the direct current
(DC) signal energy, F_{m,l}(0,0), from the square of the
discrete cosine transformation function F(u, v), as shown in
equation (2)

$$E(m,l)=\sum_{u=0}^{D-1}\sum_{v=0}^{D-1}F_{m,l}(u,v)^{2}-F_{m,l}(0,0)^{2}\qquad(2)$$



The AC signal energy, E(m,l), is then compared to a
threshold energy, T_en. For each DxD block of pixels, if the
AC signal energy, E(m,l), is less than a preselected
threshold energy, T_en, the block of pixels is identified as a skin
area, as indicated below,

E(m,l) < T_en    Skin area
E(m,l) ≥ T_en    Non-skin area

Typically, when a DxD block of pixels has an AC signal 
energy value that is less than 120,000 such a block of pixels 
is identified as a skin region. It is advantageous to utilize the 
signal energy components of the luminance parameter to 
determine skin areas, since non-skin areas tend to have much 
higher signal energy components than do skin areas. Iden- 
tifying such non-skin areas and eliminating them from the 
color sampling process increases the probability that the 
color of a sampled pixel is actually a skin area pixel, and 
thus improves the accuracy of the range of tones to be 
sampled. 
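
The AC-energy test described above can be sketched as follows. This is a minimal illustration rather than the disclosed circuit: it assumes an orthonormal two-dimensional DCT-II as the form of equation (1), which is not reproduced in this text, and the function names `dct2`, `ac_energy` and `is_skin_block` are hypothetical. The threshold of 120,000 is the typical value quoted above.

```python
import math

T_EN = 120_000  # typical threshold energy quoted in the text


def dct2(block):
    """Orthonormal 2-D DCT-II of a DxD block (assumed form of equation (1))."""
    d = len(block)

    def c(w):
        return 1.0 / math.sqrt(2.0) if w == 0 else 1.0

    coeffs = [[0.0] * d for _ in range(d)]
    for u in range(d):
        for v in range(d):
            total = 0.0
            for x in range(d):
                for y in range(d):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * d))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * d)))
            coeffs[u][v] = (2.0 / d) * c(u) * c(v) * total
    return coeffs


def ac_energy(block):
    """Sum of squared DCT coefficients minus the squared DC coefficient."""
    coeffs = dct2(block)
    total = sum(f * f for row in coeffs for f in row)
    return total - coeffs[0][0] ** 2


def is_skin_block(block, threshold=T_EN):
    """A block is treated as skin when its AC energy is below the threshold."""
    return ac_energy(block) < threshold
```

Under this normalization a block of constant luminance has an AC energy of zero, so it falls below any positive threshold. Other DCT scalings would change the numerical energies and therefore the threshold that is appropriate.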

Once a block of DxD pixels has been identified by the
skin region detector 200 as a skin region, the C_r values and
the C_b values of the block of DxD pixels are sampled by the
C_r histogram generator 201 and the C_b histogram generator
203, respectively. As previously discussed, it is
advantageous if the blocks of DxD pixels are 2x2 blocks of pixels,
since such blocks contain only one C_r value and one C_b
value. Both the C_r histogram generator 201 and the C_b
histogram generator 203 then generate histograms for the
sampled C_r and C_b values, respectively.

Once a C_r histogram and a C_b histogram have been
generated, the range of color parameters representative of
skin tone for the sampled object is determined by the C_r
range detector 205 and the C_b range detector 207 using
statistical analysis techniques. For example, with each data
set the mean and mode C_r and C_b values are determined for
each block of DxD pixels sampled. When the mean and
mode C_r and C_b values are within some specified distance,
D_p, of each other, such mean and mode C_r and C_b values are
identified as representing a single peak. Thereafter, for each
block of DxD pixels, if a pixel color parameter is within a
predetermined distance, for example, one standard
deviation, of such mean and mode C_r and C_b values
representative of a single peak, then the pixel color parameter is
included in the range of skin tone values. When the mean
and mode are separated by a distance greater than the specified
distance, D_p, such mean and mode C_r and C_b values are
identified as representing two individual peaks. The pixel
color parameters for blocks of DxD pixels with mean and
mode C_r and C_b values that are representative of two
individual peaks are not included in the range of skin tone
values. 
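
The single-peak test above can be sketched for one chrominance channel as follows. This is only an illustration under stated assumptions: the function name `skin_tone_range`, the parameter `k` and the choice to extend the range one standard deviation below the smaller and above the larger of the mean and mode are hypothetical, since the text does not fix how the range limits are formed.

```python
from collections import Counter
from statistics import mean, pstdev


def skin_tone_range(samples, d_p, k=1.0):
    """Return (low, high) for one chrominance channel, or None for two peaks.

    samples: sampled C_r (or C_b) values from blocks identified as skin.
    d_p:     maximum mean-to-mode distance still treated as a single peak.
    k:       width of the accepted range in standard deviations (assumed).
    """
    mu = mean(samples)
    mode = Counter(samples).most_common(1)[0][0]
    if abs(mu - mode) > d_p:
        # Mean and mode are far apart: treated as two individual peaks.
        return None
    spread = k * pstdev(samples)
    return (min(mu, mode) - spread, max(mu, mode) + spread)
```

For example, a tightly clustered sample set such as 120, 121, 121, 121, 122, 123 yields a range around 121, while a set split between values near 100 and values near 150 is rejected when `d_p` is small.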

Based on the range of C_r and C_b values generated in the
C_r range detector 205 and the C_b range detector 207,
respectively, the tone comparator 209 analyzes the entire
frame of the input video signal 26, to locate all other areas
containing the same chrominance parameters. When such
other regions are located, a skin information signal 211
denoting the location of the skin areas is generated by the
tone comparator 209. 
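
The frame-wide comparison can be pictured with the short sketch below. It assumes the chrominance planes are available as equally sized lists of rows and that each range is a closed (low, high) interval. The function name `skin_mask` and the data layout are hypothetical and are not part of the disclosure.

```python
def skin_mask(cr_plane, cb_plane, cr_range, cb_range):
    """Mark every position whose C_r and C_b both fall inside the given ranges."""
    cr_low, cr_high = cr_range
    cb_low, cb_high = cb_range
    return [
        [cr_low <= cr <= cr_high and cb_low <= cb <= cb_high
         for cr, cb in zip(cr_row, cb_row)]
        for cr_row, cb_row in zip(cr_plane, cb_plane)
    ]
```

The resulting boolean grid plays the role of the location information carried by the skin information signal: positions marked True are candidate skin areas.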

Skin area detector 12 performs the above described
analysis for each frame of a video sequence or optionally analyzes
a single frame and then the tone comparator 209 utilizes that
range of skin tone values to identify skin areas in a specified
number of subsequent frames.

In the embodiment of the present invention, wherein the
outline of an object or objects identified by the shape locator
50 matches well fitting ellipses and before such a shape or
shapes have been verified to contain skin areas, a shape
location signal 106 generated by the shape locator 50 is
optionally provided to an eyes-nose-mouth (ENM) region
detector 52, as shown in FIG. 7. The ENM region detector
52 receives the coordinates of the well-fitted elliptical
outlines from the shape locator 50 and segments the elliptical
region into a rectangular window 60 and a complement area
62 (containing the remainder of the ellipse not located
within rectangular window 60), as shown in FIG. 8. The
ENM region detector 52 receives the elliptical parameters
and processes them such that a rectangular window 60 is
positioned to capture the region of the ellipse corresponding
to the eyes, nose and mouth region.

The ENM region detector 52 determines a search region
for locating rectangular window 60 using the search region
identifier 108, where the coordinates of the center point (x_0,
y_0) of the elliptical outline as shown in FIG. 8 are used to
obtain estimates for the positioning of the rectangular
window 60. The search region for locating the center point of the
ENM region is a rectangle of size SxT pixels such as, for
example, 12x15 pixels, and is advantageously chosen to
have a fixed size relative to the major and minor axes of the
elliptical shape outline. The term major axis as used in this
disclosure is defined with reference to FIG. 8 and refers to
the line segment bisecting the ellipse between points y_1 and
y_2. The term minor axis as used in this disclosure is also
defined with respect to FIG. 8 and refers to the line segment
bisecting the ellipse between points x_1 and x_2. As an
illustrative example, assume the ellipse has a length along
the major axis of 50 pixels and a length along the minor axis
of 30 pixels. The size of the rectangular window 60 is
advantageously chosen to have a size of 25x15 pixels, which
approximates half the length of the ellipse along both the
major and minor axes and captures the most probable
location of the eyes-nose-mouth region of the shape. 
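
The window sizing in the example above can be written out as a small sketch. It assumes the major axis is vertical, as FIG. 8 is described, and that the window is initially centered on the center point of the ellipse; the actual position is refined by the search described in the text. The function name `enm_window` and the returned tuple layout are hypothetical.

```python
def enm_window(x0, y0, major_len, minor_len):
    """Return (x_min, y_min, x_max, y_max) of an initial ENM window estimate.

    The window spans half the major-axis length vertically and half the
    minor-axis length horizontally, centered on (x0, y0) (assumed placement).
    """
    half_height = major_len / 4.0  # half of (major_len / 2)
    half_width = minor_len / 4.0   # half of (minor_len / 2)
    return (x0 - half_width, y0 - half_height,
            x0 + half_width, y0 + half_height)
```

For an ellipse with a 50 pixel major axis and a 30 pixel minor axis, this gives a window that is 25 pixels along the major axis and 15 pixels along the minor axis, matching the 25x15 example.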

Once rectangular window 60 is located within the ellipse,
the search region scanner 110 analyzes the rectangular
window to determine each candidate position for an axis of
symmetry with respect to the eyes-nose-mouth region of the
ellipse. For example, search region scanner 110, in a
left-to-right fashion, selects each vertical row of pixels within
rectangular window 60 using a line segment 64 placed
parallel to the major axis, in order to search for an axis of
symmetry, positioned between the eyes, through the center
of the nose and halfway through the mouth. After the axis of
symmetry is determined with respect to the facial axis, the
ENM region detector 52 generates an ENM region signal 54
corresponding to the coordinates of the resulting
eyes-nose-mouth region of the rectangular window 60. The ENM
signal 54 notifies the tone detector 56 of the coordinates for
the location of the eyes, nose, and mouth region of the object
so that pixels not included in such region are excluded from
subsequent color parameter analysis. It is advantageous for
the eyes-nose-mouth region to be identified since such a
region of the face contains skin color parameters as well as
color parameters other than skin tone parameters, including
for example, eye color parameters, eyebrow color
parameters, lip color parameters, and hair color parameters.
Identifying the skin color parameters in the eye-nose-mouth
region improves the accuracy of the range of color
parameters that are sampled, since the identification of the ENM
region is a strong indication of the presence of a skin area.
Also, computational complexity is advantageously reduced,
because the ENM region is smaller than the well-fitted
ellipse from which it is derived.

Detection of the eyes-nose-mouth region may also be
affected when the subject does not look directly at the
camera, which often occurs for example, in video
teleconferencing situations. The ENM region detector 52 also
includes detection of an eyes-nose-mouth region for an input
video image where the subject does not directly face the
camera, the subject has facial hair and/or wears eyeglasses.
The ENM region detector 52 exploits the typical symmetry
of facial features with respect to a longitudinal axis going
through the nose and across the mouth, where the axis of
symmetry may be slanted at an angle θ_1, as shown in FIG.
8, with respect to the vertical axis of the image. For such
slanted ellipses, the rectangular window 60 is rotated by
discrete angle values about the center of the window, in
order to provide robustness in the detection of the
eye-nose-mouth region. Advantageously, angle θ_1 has a value within
the range of -10 degrees to 10 degrees.

Skin area detector 12 is optionally used in conjunction
with a video coder/decoder (codec) such as video codec 10.
The following explanation discusses the operation of skin
area detector 12 with regard to the other component parts of
video codec 10 as shown in FIG. 1. Video codec 10 includes
video coder 22 and video decoder 24, where video coder 22
is controlled by coding controller 16. For coding operations
the video codec 10 receives an input video signal 26, which
is forwarded to the skin area detector 12 and video coder 22.
The skin area detector 12 analyzes the input video signal as
described above and provides information related to the
location of skin areas to the coding controller 16. The video
coder 22 codes the input video signal under the control of the
coding controller 16 to generate an output coded bitstream
30, wherein the skin areas, identified using the above
described skin area detector, are encoded with a higher
number of bits than are areas that are not so identified. For
example, a coding controller, such as coding controller 16,
typically encodes and transmits only those discrete cosine
transform (DCT) data components, which have a value
above some threshold value (quantization factor). As an
illustrative example, assume that an area of 16x16 pixels has
data components whose values range from 1 to 16 and that
the threshold value was selected to be 8. Then, the coding
controller will only code those DCT data components whose
values are above the threshold value of 8. However, in the
embodiment of the present invention, the data components
having values below the threshold value, for portions of the
video signal that are identified as containing skin areas, now
are encoded along with the data components having values
above the threshold value. As a result, the areas of the video
that are identified as skin areas are encoded with a
higher number of bits than areas that are not so identified. In
one embodiment, the video coder 22 encodes the input video
signal 26 using a source coder 32, a video multiplex coder,
a transmission buffer 36, and a transmission coder 58 to
generate the output coded bitstream 30.

For decoding operations, the video codec 10 receives an
input coded bitstream 40. The video decoder 24 decodes the
input bitstream 40 using a receiving decoder 42, a
receiving buffer 44, a video multiplex decoder 46, and a
source decoder 48 for generating the output video signal 50.

It should be understood that while the present
invention has been described with reference to an illustrative
embodiment, other arrangements may be apparent to those
of ordinary skill in the art.

The invention claimed is:

1. An apparatus for determining skin tone in a color video
signal, the apparatus comprising:

a locator for identifying objects of a desired shape from at
least a portion of the color video signal; and

a detector for analyzing at least one pixel from at least one
of the identified objects of the desired shape to
determine whether the analyzed pixel has a signal energy
within a predetermined range indicative of a skin area.

2. The apparatus of claim 1, wherein the desired shape is
a shape that is likely to contain a skin area.

3. The apparatus of claim 2, wherein the desired shape has
an arc associated with a human shape.

4. The apparatus of claim 3, wherein the desired shape is
elliptical.

5. The apparatus of claim 1, wherein the analyzed pixel
has a luminance parameter indicative of the skin area, the
luminance parameter comprising an alternating current (AC)
signal energy component of the analyzed pixel.

6. The apparatus of claim 1, wherein the detector further
samples the analyzed pixel to determine at least one color
parameter of the pixel.

7. The apparatus of claim 6, wherein the at least one color
parameter is a chrominance parameter.

8. The apparatus of claim 6, wherein the detector further
includes a comparator which compares the determined at
least one color parameter of the analyzed pixel with a
plurality of color parameters in nonanalyzed pixels of the
color video signal, to identify the plurality of color
parameters in nonanalyzed pixels which are identical to the
determined at least one color parameter of the analyzed
pixel.

9. The apparatus of claim 6, wherein a coder generates a
code segment based on the location of the at least one color
parameter of the analyzed pixel.

10. The apparatus of claim 3, wherein the arc associated
with the human shape is analyzed to determine whether the
shape contains pixels associated with an eyes-nose-mouth
(ENM) region.

11. The apparatus of claim 10, wherein the pixels not
associated with the eyes-nose-mouth (ENM) region are
excluded from analysis by the detector.

12. A method for determining skin tone in a color video
signal, the method comprising the steps of:

identifying objects of a desired shape from at least a
portion of the color video signal; and

analyzing at least one pixel from at least one of the
identified objects of the desired shape to determine
whether the analyzed pixel has a signal energy within
a predetermined range indicative of a skin area.

13. The method of claim 12, wherein the desired shape is
a shape that is likely to contain a skin area.

14. The method of claim 13, wherein the desired shape
has an arc associated with a human shape.

15. The method of claim 14, wherein the desired shape is
elliptical.

16. The method of claim 12, wherein the analyzed pixel
has a luminance parameter indicative of the skin area, the
luminance parameter comprising an alternating current (AC)
signal energy component of the analyzed pixel.

17. The method of claim 12, further comprising the step
of sampling the analyzed pixel to determine at least one
color parameter of pixel.

18. The method of claim 17, wherein the at least one color
parameter is a chrominance parameter.

19. The method of claim 17, further comprising the step
of comparing the determined at least one color parameter of
the analyzed pixel with a plurality of color parameters in
nonanalyzed pixels of the color video signal, to identify the
plurality of color parameters in nonanalyzed pixels which
are identical to the determined at least one color parameter
of the analyzed pixel.

20. The method of claim 17, further comprising the step
of generating a code segment based on the location of the at
least one color parameter of the analyzed pixel.

21. The method of claim 14, further comprising the step
of analyzing the arc associated with the human shape to
determine whether the shape contains pixels associated with
an eyes-nose-mouth (ENM) region.

22. The method of claim 21, wherein the pixels not
associated with the eyes-nose-mouth (ENM) region are
excluded from analysis by the detector.

23. An apparatus for determining a skin area in an input
color video signal and encoding an area determined to be a
skin area with a higher number of bits, the apparatus
comprising:

a shape locator for analyzing at least a portion of the color
video signal to identify objects of a desired shape;

a tone detector for analyzing at least one pixel from at
least one of the identified objects of the desired shape
to determine whether the analyzed pixel has a
luminance parameter with a signal energy within a
predetermined range of signal energies indicative of a skin
area; and

a video controller for receiving an indication that said at
least one identified object has been determined to have
a luminance parameter with a signal energy indicative
of a skin area and for controlling a video coder in
response thereto to code said at least one identified
object with a higher number of bits.

24. The apparatus of claim 23 wherein said portion of the
color video signal comprises a frame and wherein the shape
locator identifies all objects of a desired shape within the
frame.

25. The apparatus of claim 24 comprising:

a color sampling processor to test the color of said at least
one pixel.

26. The apparatus of claim 25 wherein said tone detector
is further operable to determine non-skin areas and said
video controller eliminates said non-skin areas from testing
by the color sampling processor.




27. The apparatus of claim 23 wherein said signal energy 
is a sum of the squares of the luminance parameter. 

28. The apparatus of claim 23 wherein all of the pixels of 
said analyzed object are sampled to determine a range of 
skin tone values for said analyzed object. 

29. The apparatus of claim 28 wherein the range of skin 
tone values is compared with all tones contained in a video 
image to identify other areas having the same skin tone 
value. 

30. A method for determining a skin area in an input color 
video signal and encoding an area determined to be a skin 
area with a higher number of bits, the method comprising the 
steps of: 

analyzing at least a portion of the color video signal to 
identify objects of a desired shape; 

analyzing at least one pixel from at least one of the 
identified objects of the desired shape to determine 
whether the analyzed pixel has a luminance parameter 
with a signal energy within a predetermined range of 
signal energies indicative of a skin area; and 

utilizing an indication that said at least one identified 
object has been determined to have a luminance param- 
eter with a signal energy indicative of a skin area to 
control video coding so that said at least one identified 
object is coded with a higher number of bits. 

31. The method of claim 30 wherein said portion of the 
color video signal comprises a frame and said first step of 
analyzing further comprises identifying all objects of a 
desired shape within the frame. 

32. The method of claim 31 further comprising the step of: 
testing the color of said at least one pixel. 

33. The method of claim 32 further comprising the steps 
of: 

determining non-skin areas; and 

eliminating said non-skin areas from color testing. 

34. The method of claim 30 wherein said signal energy is 
a sum of the squares of the luminance parameter. 

35. The method of claim 30 wherein all of the pixels of 
said analyzed object are sampled to determine a range of 
skin tone values for said analyzed object. 

36. The method of claim 35 further comprising the step of:
comparing the range of skin tone values with all tones
contained in a video image to identify other areas
having the same skin tone value.

37. An apparatus for determining a skin area in an input 
color video signal and encoding an area determined to be a 
skin area with a higher number of bits, the apparatus 
comprising: 

means for identifying objects of a desired shape from at 
least a portion of the color video signal; 

means for analyzing at least one pixel from at least one of 
the identified objects of the desired shape to determine 
whether the analyzed pixel has a signal energy within 
a predetermined range indicative of a skin area; and 

means for receiving an indication that said at least one 
identified object has been determined to have a lumi- 
nance parameter with a signal energy indicative of a 
skin area and controlling a video coder in response 
thereto to code said at least one identified object with 
a higher number of bits. 

38. The apparatus of claim 37 wherein said portion of the
color video signal comprises a frame and wherein the means
for identifying objects of a desired shape identifies all
objects of a desired shape within the frame.

39. The apparatus of claim 37 further comprising: 
means for testing the color of said at least one pixel. 

40. The apparatus of claim 39 wherein said means for
analyzing to determine whether the analyzed pixel has a
signal energy within a predetermined range indicative of a
skin area is further operable for determining non-skin areas
and said means for receiving and controlling is further
operable for eliminating said non-skin areas from testing by
the means for testing.

41. The apparatus of claim 37 wherein said signal energy
is a sum of the squares of a luminance parameter for said at
least one pixel.

42. The apparatus of claim 37 wherein all of the pixels of
said analyzed object are sampled to determine a range of
skin tone values for said analyzed object.

43. The apparatus of claim 42 wherein the range of skin 
tone values is compared with all tones contained in a video 
image to identify other areas having the same skin tone 
value. 



